Abstract

Salmonella Typhi is a human-restricted pathogen, and a significant number of infected individuals become asymptomatic
carriers of the bacterium. Salmonella infection could be controlled effectively if a reliable method for identifying
these carriers were developed. In this context, the availability of whole genomes of carrier strains through
high-throughput sequencing, followed by downstream analysis using comparative genomics approaches, is very promising.
Herein we describe the genome sequence of a Salmonella Typhi isolate representing an asymptomatic carrier individual
during a prolonged outbreak of typhoid fever in Kelantan, Malaysia. Putative genomic coordinates relevant to
pathogenesis and persistence of this carrier strain are identified and discussed.



Background 

Salmonella enterica serovar Typhi, the aetiologic agent
of typhoid fever, still poses a major health problem
for the developing world, with about 16 million new cases
reported each year [1]. S. Typhi causes systemic
infections (typhoid fever) as well as chronic infections
(asymptomatic carriage) in humans; the latter serve as
the source of infection [2]. S. Typhi is transmitted
primarily through the faecal-oral route, and a significant
number of infected individuals become chronic
asymptomatic carriers who keep shedding S. Typhi in
faeces for decades [3]. This results in endemicity of
S. Typhi in regions of the world with underdeveloped
sanitation and community hygiene [4].

Carrier identification becomes extremely important because
some ancestral haplotypes have been observed in recent
isolates, suggesting their persistence in asymptomatic
carriers [5]. Traditional methods such as culturing
bacteria from faecal samples are not foolproof, as
carriers shed bacteria intermittently. Serological tests
to detect specific antibodies such as anti-H and anti-O
are unable to differentiate between carriers and
individuals who have recovered from the infection [6].
In areas endemic for S. Typhi in particular, high
background levels of these antibodies mean that
serological tests cannot be adopted for the
identification of carriers [7]. Thus, there is an urgent
need for inexpensive and efficient methods to establish
the carrier state, perhaps based on genomic markers.

Genetic typing tools such as PFGE, AFLP and ribotyping
can resolve only the limited genetic variation occurring
within specific sites, and are therefore incapable of
differentiating highly clonal strains, such as
outbreak-related strains, from those not associated with
the outbreak (carrier isolates) [8-10]. High-throughput
sequencing technologies have already been employed as a
high-resolution molecular epidemiologic tool to discern
the micro-evolution of highly related strains [11].

In this study, we attempted to determine whether whole
genome sequencing of S. Typhi isolated from a carrier
individual can provide insights into persistence and/or
adaptation mechanisms. We describe the genome sequence
of a Salmonella enterica serovar Typhi strain
(ST CR0063) isolated from a carrier individual during a
prolonged outbreak of typhoid fever in Kelantan,
Malaysia.

Results and discussion 

Genome statistics 

The draft genome of Salmonella Typhi ST CR0063 is
4,585,851 bp in size, with a coding percentage of 86.1%
and a G+C content of about 51.71%. A total of 4946 CDS
were predicted, with an average gene length of about 798
nucleotides. The genome of ST CR0063 contains 77 tRNA
and 22 rRNA genes. The subsystem distribution of the
basic metabolic machinery of this strain is shown in
Figure 1. The assembled draft genome shows a high degree
of similarity, and shared core genome regions, with
Salmonella Typhi ST BL196 [12], a strain associated with
a typhoid outbreak in Kelantan during the same period
(Figure 2).
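
The reported statistics can be cross-checked against one another. The following is a minimal sketch, assuming that the coding percentage is simply the number of CDS multiplied by the mean CDS length and divided by the genome size, with overlaps between genes ignored. The function names are illustrative and are not taken from any of the tools cited in this paper.

```python
# Figures reported in the text for ST CR0063.
GENOME_SIZE_BP = 4_585_851
CDS_COUNT = 4946
MEAN_CDS_LENGTH_NT = 798  # reported as "about 798", so the result is approximate


def coding_percentage(cds_count: int, mean_cds_length: float, genome_size: int) -> float:
    """Approximate share of the genome covered by CDS, ignoring overlaps."""
    return 100.0 * cds_count * mean_cds_length / genome_size


def gc_percent(sequence: str) -> float:
    """G+C percentage of a nucleotide string; ambiguous bases are excluded."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    pct = coding_percentage(CDS_COUNT, MEAN_CDS_LENGTH_NT, GENOME_SIZE_BP)
    print(f"approximate coding percentage: {pct:.1f}%")
```

Under this assumption the three reported figures agree with the stated coding percentage to one decimal place. The `gc_percent` helper could be applied to the deposited contigs to recompute the G+C content.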



Virulence factors

The gene shdA, a key factor predicted to be involved in
persistence of the bacterium in the intestine [14] by
binding to its extracellular matrix, was identified and
annotated. Its product, by mimicking host heparin, is
able to bind to the extracellular matrix proteins
fibronectin and collagen, and probably plays an important
role in carriers by contributing to prolonged faecal
shedding [15]. The fim gene cluster [16] of the
chaperone-usher family, involved in adhesion to
non-phagocytic cells, was detected along with its
negative regulator fimW. Type IV pili and the agf operon
[17,18] encoding curli fimbriae, which aid in attachment
of the bacteria to intestinal villi and to each other,
were also found in the genome. These adherence factors
determine the sites of bacterial colonisation and thereby
the adaptation and pathogenicity of a particular strain
[19,20].

The S. Typhi strain ST CR0063 genome also revealed the
viaA and viaB loci, the prime regulators of Vi antigen
expression. The viaB locus contains all genes for the
biosynthesis (tviA-E) and export (vexA-E) of the Vi
antigen, a well-known virulence factor [21,22]. The mgtC
gene, involved in magnesium uptake, and the ferric uptake
regulator (fur) [23] were also identified in ST CR0063.
The PhoPQ regulon [24], which induces cytokine secretion
and cationic antimicrobial peptide resistance, was also
found to be conserved in our carrier strain. The RpoS
sigma factor, needed to cope with external stress and
nutrient depletion [25], was also identified and
annotated. The coordinates of these virulence factors in
the genome of ST CR0063 are depicted in Figure 3.



Phages and pathogenicity islands (PAIs) 

The phages Gifsy-1 and Fels-2 [27], together with many
phage proteins and a few hypothetical proteins, were
identified in the genome of ST CR0063 by various
algorithms (see Methods for details). These phages were
probably acquired through horizontal gene transfer (HGT)
events, as they were embedded in some of the genomic
islands identified. The phage encoding the SopE effector
protein of SPI-1 (Salmonella Pathogenicity Island 1) was
present in ST CR0063, as recognized in other Typhi
genomes [28,29].



Figure 1 Subsystem distribution of ST CR0063. The subsystem statistics of ST CR0063 based on genome annotations
performed according to RAST conventions. Panels: Subsystem Coverage, Subsystem Category Distribution and Subsystem
Feature Counts. Feature counts per subsystem category:

Cofactors, Vitamins, Prosthetic Groups, Pigments (341)
Cell Wall and Capsule (289)
Virulence, Disease and Defense (95)
Potassium metabolism (29)
Photosynthesis (0)
Miscellaneous (182)
Phages, Prophages, Transposable elements, Plasmids (49)
Membrane Transport (179)
Iron acquisition and metabolism (31)
RNA Metabolism (238)
Nucleosides and Nucleotides (112)
Protein Metabolism (282)
Cell Division and Cell Cycle (37)
Motility and Chemotaxis (82)
Regulation and Cell signaling (132)
Secondary Metabolism (4)
DNA Metabolism (172)
Regulons (1564)
Fatty Acids, Lipids, and Isoprenoids (113)
Nitrogen Metabolism (61)
Dormancy and Sporulation (3)
Respiration (176)
Stress Response (170)
Metabolism of Aromatic Compounds (16)
Amino Acids and Derivatives (436)
Sulfur Metabolism (33)
Phosphorus Metabolism (44)
Carbohydrates (567)
Figure 2 Comparison of Salmonella Typhi strains ST CR0063 and ST BL196. Comparison of whole genome sequences of
S. Typhi strains using MG-CAT: one strain was isolated from a carrier individual (ST CR0063) and the other from an
infected individual (ST BL196) during a prolonged outbreak of typhoid fever in Kelantan [13].



Figure 3 Circular genome view of ST CR0063. Positions of some of the major virulence factors and their regulators
identified in ST CR0063, marked in the circular genome generated using CGview [26].

More than 15 PAIs that encode clusters of
virulence-associated genes have been identified across
various serovars of Salmonella enterica. We identified
ten pathogenicity islands in ST CR0063 and, as expected
[30], they were characterised by a distinct G+C content
and bounded by tRNA genes. The SPI-1 type III secretion
system (TTSS) structural genes spaMNOPQRS and invABCEFGH
and their regulatory proteins HilA, HilC and HilD [31]
were also identified and annotated. The SPI-1 secreted
effector proteins SopE, SopE2, SipA, SipB, SipC and SptP,
required for endothelial uptake and invasion [32], are
also present. The effectors SpiC, SseF, SseG, SifA and
SifB, which are secreted by the SPI-2 TTSS and needed for
survival in macrophages and colonisation of host organs
[33], were also recognised in the present genome. The
known regulators of SPI-2, OmpR-EnvZ and PhoP-PhoQ [34],
were present. SPI-3 contained the magnesium transport
genes mgtC and marT, which are required for survival in
macrophages [35]. The type I secretion system and its
associated proteins encoded by SPI-4, which are involved
in the invasion of the intestinal epithelium [36], were
also located in the present genome. The SPI-1 effector
proteins SopB and PipB, associated with enteritis and
encoded by SPI-5 [37], were also detected and annotated.
The chaperone-usher fimbrial operons carried by SPI-6 and
SPI-10 and the bacteriocin immunity proteins carried by
SPI-8 [38] were identified. SPI-7 and SPI-9 were
identified in the ST CR0063 genome and were found to
encode the viaB locus, type IV pili formation proteins
and a type I secretion system (T1SS) [38,39].

Conclusions and prospective 

The genomic blueprint of Salmonella Typhi isolate ST
CR0063 was elucidated in this study. The genome sequence
information presented herein may be harnessed to guide
comparative genomics and the identification of novel and
specific diagnostic markers. However, further studies
involving large-scale genome sequencing of strains from
several endemic countries, especially those from carrier
individuals in different socio-economic settings, are
needed to develop a reliable approach to deciphering the
characteristics of the carrier state. It will also be
necessary to determine the true extent of the diversity
of carrier strains, as compared with their acutely
pathogenic counterparts, in terms of 1) gene gain/loss
during colonization and adaptation; 2) dynamics of
virulence acquisition/attenuation; 3) possible genomic
rearrangements; and 4) the relative preponderance of
carrier and virulent strains circulating in different
endemic regions of the world. Finally, only an in-depth
analysis of host-pathogen interactions and their
influence on the gut microbiota can explain the
adaptation and persistence mechanisms of the
(asymptomatic) carrier strains.

Methods 

Genome sequencing 

DNA was isolated from the stool sample of an
asymptomatic carrier individual from Kelantan, Malaysia
in 2007 during a prolonged outbreak. The draft genome
sequence of this strain (ST CR0063) was determined on an
Illumina Genome Analyzer (GAIIx, pipeline ver 1.6).
Paired-end sequencing (100 bp reads) was performed with
an insert size of 300 bp. About 67X genome coverage was
achieved and 1.9 gigabytes of data were obtained.
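
For orientation, sequencing depth relates to read count by depth = (number of reads x read length) / genome size. The sketch below applies that textbook relation to the figures reported above. It assumes that every read has the full 100 bp length and that all reads contribute to coverage. It does not attempt to reconcile the result with the reported data volume, which also depends on file format and read filtering.

```python
GENOME_SIZE_BP = 4_585_851  # draft genome size reported in the text
READ_LENGTH_BP = 100        # paired-end read length reported in the text
COVERAGE = 67               # approximate depth reported in the text


def bases_for_coverage(coverage: int, genome_size: int) -> int:
    """Total sequenced bases implied by a given mean depth."""
    return coverage * genome_size


def reads_for_coverage(coverage: int, genome_size: int, read_length: int) -> float:
    """Number of full-length reads implied by a given mean depth."""
    return bases_for_coverage(coverage, genome_size) / read_length


if __name__ == "__main__":
    total_bases = bases_for_coverage(COVERAGE, GENOME_SIZE_BP)
    reads = reads_for_coverage(COVERAGE, GENOME_SIZE_BP, READ_LENGTH_BP)
    print(f"bases needed: {total_bases:,}")
    print(f"reads needed: {reads:,.0f} (about {reads / 2:,.0f} pairs)")
```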

Assembly and annotation 

The sequence data were assembled de novo, as described
previously [40-45], into 538 contigs using Velvet [46]
at an optimal hash length of 39. SSPACE [47] was used to
scaffold the pre-assembled contigs using paired-end
data. The gaps within these scaffolds were filled using
GapFiller, by aligning the reads against the scaffolds
generated by SSPACE [48].
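
The text does not state how the optimal hash length was chosen. One widely used summary statistic for comparing assemblies produced at different hash lengths is N50, the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The sketch below shows that calculation only. Using N50 as the selection criterion is an assumption made for illustration, not the authors' documented procedure.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Toy lengths, not data from this study.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```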

A reference-guided assembly was generated by aligning
reads to Salmonella Typhi str. CT18 [GenBank:
AL513382.1] using BWA [49]. This reference-guided
assembly was used to re-order the scaffolds generated
de novo. In-house Perl scripts were used for this
re-ordering and to finalize the gaps. The de novo and
reference-guided approaches were combined to finalize
the consensus draft genome. The reference-guided
assembly and the reordered scaffolds were loaded into
Tablet, an NGS data visualisation tool, to visualise
repeats, insertions and deletions [50].
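
The in-house Perl scripts are not reproduced here. As a rough illustration of what re-ordering scaffolds against a reference involves, the following sketch sorts scaffolds by the leftmost reference coordinate at which each one aligns and places unaligned scaffolds at the end. The `Placement` record, its fields and the example values are hypothetical. How placements are derived from the alignments is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of where a scaffold aligns on the reference."""
    scaffold: str
    ref_start: Optional[int]  # leftmost reference coordinate, None if unplaced
    reverse: bool = False     # True if the scaffold aligns to the reverse strand


def reorder(placements: List[Placement]) -> List[Tuple[str, str]]:
    """Order scaffolds by reference position and report their orientation."""
    placed = sorted(
        (p for p in placements if p.ref_start is not None),
        key=lambda p: p.ref_start,
    )
    # Scaffolds without a reference position keep their input order at the end.
    unplaced = [p for p in placements if p.ref_start is None]
    return [(p.scaffold, "-" if p.reverse else "+") for p in placed + unplaced]


if __name__ == "__main__":
    example = [
        Placement("scaffold3", 9000),
        Placement("scaffold1", None),
        Placement("scaffold2", 100, reverse=True),
    ]
    print(reorder(example))
```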

The final draft nucleotide sequence, after manual
curation, was annotated in our laboratory using RAST
[51] and the ISGA pipeline [52]. The genome statistics
were obtained using Artemis [53]. The data were further
validated using gene prediction tools such as Glimmer
[54] and EasyGene [55]. RNAmmer [56] and tRNAscan-SE
[57] were used to identify rRNA and tRNA genes,
respectively.

Phages and PAIs 

Prophages and putative phage-like elements in the
genome were identified using PhiSpy [58] and Prophage
Finder [59]. Putative HGT events were determined using
the Alien Hunter tool [60]. The integrated interface
IslandViewer was used to predict putative genomic
islands within the genome [61].
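
Because the pathogenicity islands reported above differ in G+C content from the rest of the genome, a simple way to see the underlying signal is to scan the sequence in fixed windows and flag windows whose G+C fraction deviates from the genome-wide value. The sketch below is a deliberately simplified illustration of compositional screening. It is not the algorithm implemented by any of the tools named in this section, and the window size, step and threshold are arbitrary assumptions.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(
    genome: str, window: int, step: int, threshold: float
) -> List[Tuple[int, int, float]]:
    """Return (start, end, gc) for windows deviating from the overall G+C fraction."""
    overall = gc_fraction(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        frac = gc_fraction(genome[start:start + window])
        if abs(frac - overall) >= threshold:
            hits.append((start, start + window, frac))
    return hits


if __name__ == "__main__":
    # Toy sequence with an AT-rich block in the middle, not real genome data.
    toy = "GC" * 50 + "AT" * 50 + "GC" * 50
    print(atypical_windows(toy, window=100, step=100, threshold=0.5))
```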

Sequence data access 

The Salmonella enterica subsp. enterica serovar Typhi
str. CR0063 whole genome shotgun (WGS) project has been
submitted to GenBank under the project accession
AKIC00000000. The project version containing the draft
assembly described herein has the accession number
AKIC01000000 and consists of sequences
AKIC01000001-AKIC01000538.
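
For readers who wish to retrieve the individual contigs, the accession range above can be expanded programmatically. The sketch assumes the accession layout visible in the range itself: the four-letter project prefix, a two-digit assembly version and a six-digit contig number. The function name is illustrative.

```python
from typing import List


def wgs_contig_accessions(prefix: str, version: int, first: int, last: int) -> List[str]:
    """Expand a WGS contig range into individual accession strings."""
    return [f"{prefix}{version:02d}{number:06d}" for number in range(first, last + 1)]


if __name__ == "__main__":
    accessions = wgs_contig_accessions("AKIC", 1, 1, 538)
    print(len(accessions), accessions[0], accessions[-1])
```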